NSF PAR Search | NSF Public Access Repository


Agnostic Active Learning of Single Index Models with Linear Sample Complexity

Gajjar, Aarshvi; Tai, Wai Ming; Xu, Xingyu; Hegde, Chinmay; Musco, Christopher; Li, Yi (June 2024, Proceedings of Machine Learning Research)

We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $$O(d^2)$$ bound of Gajjar et al. (2023). Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is unknown. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
Full Text Available
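The leverage score sampling mentioned in the abstract above can be sketched for the plain linear case. This is a minimal illustration of standard leverage score sampling, not the paper's algorithm for single index models; the function names `leverage_scores` and `sample_rows`, the Gaussian test matrix, and the sample size are all assumptions made for the example.

```python
import math
import random

def leverage_scores(X):
    """Leverage score of every row of X (list of rows, assumed full column rank)."""
    n, d = len(X), len(X[0])
    # Orthonormalize the columns of X with modified Gram-Schmidt.
    cols = [[X[i][j] for i in range(n)] for j in range(d)]
    Q = []
    for c in cols:
        v = c[:]
        for q in Q:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        Q.append([a / norm for a in v])
    # Row i's leverage score is the squared norm of row i of the orthonormal basis.
    return [sum(q[i] ** 2 for q in Q) for i in range(n)]

def sample_rows(X, m, rng):
    """Draw m row indices with probability proportional to leverage score."""
    scores = leverage_scores(X)
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(len(X)), weights=probs, k=m)
    # Scaling a sampled row by 1 / sqrt(m * p_i) keeps the subsampled
    # squared loss an unbiased estimate of the full squared loss.
    weights = [1.0 / math.sqrt(m * probs[i]) for i in idx]
    return idx, weights

# Hypothetical data: 50 points in 3 dimensions.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
scores = leverage_scores(X)
idx, weights = sample_rows(X, 10, rng)
```

The scores sum to the number of columns, so the sampling probabilities are `scores[i] / d`. Labels would then be queried only for the rows in `idx`.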
Inconsistency of Cross-Validation for Structure Learning in Gaussian Graphical Models

Lyu, Zhao; Tai, Wai Ming; Kolar, Mladen; Aragam, Bryon (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node within a Gaussian graphical model, optimized using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs, encompassing general, sparse covariance structures. To support our theoretical findings, we conduct an empirical investigation of this inconsistency by contrasting our outcomes with other commonly used information criteria through an extensive simulation study. Given that many algorithms designed to learn the structure of graphical models require hyperparameter selection, the precise calibration of this hyperparameter is paramount for accurately estimating the inherent structure. Consequently, our observations shed light on this widely recognized practical challenge.
Full Text Available
Optimal estimation of Gaussian (poly)trees

Wang, Yuhao; Gao, Ming; Tai, Wai Ming; Aragam, Bryon; Bhattacharyya, Arnab (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence.
Full Text Available
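The Chow-Liu step named in the abstract above can be sketched for Gaussian data. For jointly Gaussian variables the pairwise mutual information is $$-\tfrac{1}{2}\log(1-\rho^2)$$, so the Chow-Liu tree is a maximum-weight spanning tree under that weight. This is a textbook version under assumed names (`correlation_matrix`, `chow_liu_gaussian`) and a hypothetical four-variable chain; it is not the paper's exact procedure or analysis.

```python
import math
import random

def correlation_matrix(data):
    """Sample correlation matrix of data (list of rows)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(d)]
            for a in range(d)]

def chow_liu_gaussian(data):
    """Maximum spanning tree under Gaussian mutual information (Kruskal)."""
    d = len(data[0])
    rho = correlation_matrix(data)
    # Edge weight: estimated mutual information -0.5 * log(1 - rho^2).
    edges = sorted(
        ((-0.5 * math.log(1.0 - rho[a][b] ** 2), a, b)
         for a in range(d) for b in range(a + 1, d)),
        reverse=True,
    )
    parent = list(range(d))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding (a, b) does not create a cycle
            parent[ra] = rb
            tree.append((a, b))
    return sorted(tree)

# Hypothetical ground truth: a chain X0 - X1 - X2 - X3.
rng = random.Random(0)
data = []
for _ in range(2000):
    x = [rng.gauss(0, 1)]
    for _ in range(3):
        x.append(0.8 * x[-1] + rng.gauss(0, 1))
    data.append(x)
tree = chow_liu_gaussian(data)
```

With this many samples the returned edge list should match the chain that generated the data; with few samples the estimated correlations can reorder the edges.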
Learning mixtures of gaussians with censored data

Aragam, Bryon; Tai, Wai Ming (July 2023, Proceedings of the 40th International Conference on Machine Learning)

We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2)$$, i.e., a sample is observed only if it lies inside a set $$S$$. The goal is to learn the weights $$w_i$$ and the means $$\mu_i$$. We propose an algorithm that takes only $$\frac{1}{\varepsilon^{O(k)}}$$ samples to estimate the weights $$w_i$$ and the means $$\mu_i$$ within $$\varepsilon$$ error.
Full Text Available
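The observation model in the abstract above can be simulated as a rough sketch: draw from the mixture and keep a sample only when it falls in $$S$$. The function name `sample_censored_mixture`, the interval chosen for $$S$$, and the mixture parameters are hypothetical; this only generates data and is not the paper's estimation algorithm.

```python
import random

def sample_censored_mixture(weights, means, sigma, in_S, n, rng):
    """Return n observed samples from sum_i w_i N(mu_i, sigma^2) restricted to S."""
    samples = []
    while len(samples) < n:
        # Pick a component according to the mixing weights.
        i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        x = rng.gauss(means[i], sigma)
        # A draw is observed only if it lies inside S; otherwise it is lost.
        if in_S(x):
            samples.append(x)
    return samples

# Hypothetical instance: two components, S = [0, 3].
rng = random.Random(0)
observed = sample_censored_mixture(
    [0.3, 0.7], [-1.0, 2.0], 1.0, lambda x: 0.0 <= x <= 3.0, 500, rng
)
```

Because draws outside $$S$$ are discarded, the empirical mean and component proportions of `observed` are biased relative to the underlying mixture, which is what makes the estimation problem nontrivial.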
Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures

Tai, Wai Ming; Aragam, Bryon (July 2023, Proceedings of Thirty Sixth Conference on Learning Theory)

We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $$f$$ where $$f = w_1 f_1 + w_2 f_2$$, $$w_1 + w_2 = 1$$, $$w_1, w_2 > 0$$, and we are interested in learning each component $$f_i$$. Without any assumptions on $$f_i$$, this problem is ill-posed. In order to identify the components $$f_i$$, we assume that each $$f_i$$ can be written as a convolution of a Gaussian and a compactly supported density $$\nu_i$$ with $$\mathrm{supp}(\nu_1) \cap \mathrm{supp}(\nu_2) = \emptyset$$. Our main result shows that $$\left(\frac{1}{\varepsilon}\right)^{\Omega(\log\log\frac{1}{\varepsilon})}$$ samples are required for estimating each $$f_i$$. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $$\left(\frac{1}{\varepsilon}\right)^{O(\log\log\frac{1}{\varepsilon})}$$ samples to estimate each $$f_i$$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.
Full Text Available
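A short calculation shows why a rate of this form sits between polynomial and exponential. It assumes natural logarithms, ignores the constants hidden by the $$\Omega$$ and $$O$$ notation, and reads "exponential" as exponential in $$1/\varepsilon$$; none of these conventions is fixed by the abstract. Write $$t = \log\frac{1}{\varepsilon}$$.

```latex
\[
\left(\tfrac{1}{\varepsilon}\right)^{\log\log\frac{1}{\varepsilon}} = e^{t \log t},
\qquad
\left(\tfrac{1}{\varepsilon}\right)^{C} = e^{C t},
\qquad
e^{1/\varepsilon} = e^{e^{t}} .
\]
For every fixed $C > 0$ and all $t \ge e^{C}$ we have $C t \le t \log t$,
and for all $t \ge 1$ we have $t \log t \le t^{2} \le e^{t}$, so
\[
e^{C t} \;\le\; e^{t \log t} \;\le\; e^{e^{t}} .
\]
```

As $$\varepsilon \to 0$$ both gaps widen without bound, so the rate eventually exceeds every fixed polynomial in $$1/\varepsilon$$ while staying far below $$e^{1/\varepsilon}$$.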
